Abstract

Background: The sheer amount of biological data generated in recent years has driven the development
of network analysis tools to facilitate the interpretation and representation of these data. A fundamental challenge in 
this domain is the reconstruction of a protein-protein subnetwork that underlies a process of interest from a 
genome-wide screen of associated genes. Despite intense work in this area, current algorithmic approaches are 
largely limited to analyzing a single screen and are, thus, unable to account for information on condition-specific 
genes, or reveal the dynamics (over time or condition) of the process in question. 

Results: We propose a novel formulation for the problem of network reconstruction from multiple-condition data 
and devise an efficient integer program solution for it. We apply our algorithm to analyze the response to influenza 
infection and ER export regulation in humans. By comparing to an extant, single-condition tool we demonstrate the 
power of our new approach in integrating data from multiple conditions in a compact and coherent manner, 
capturing the dynamics of the underlying processes. 

Keywords: Protein-protein interaction networks, Graph algorithms, Integer linear programming



Background 

With the increasing availability of high-throughput data,
network biology has become the method of choice for 
filtering, interpreting and representing these data. A fun- 
damental problem in network biology is the reconstruc- 
tion of a subnetwork that underlies a process of interest 
by efficiently connecting a set of implicated proteins (e.g. 
derived by some genome-wide screen) in a network of 
physical interactions. In recent years, several algorithms 
have been suggested for different variants of this problem, 
including the Steiner tree based methods of [1,2], the flow 
based approach of [3] and the anchored reconstruction 
method of [4]. 

Despite the plethora of network reconstruction meth- 
ods, these have been so far largely limited to explaining a 
single experiment or condition. In practice, the network 
dynamically changes over time or conditions, calling for
reconstructions that can integrate such data into a coherent
picture of the activity dynamics of the underlying
pathways.

Here we tackle this multiple-condition scenario, where 
the reconstructed subnetwork should explain in a coher- 
ent manner multiple experiments driven by the same set 
of proteins (referred to here as anchor proteins) while pro- 
ducing different sets of affected proteins, or terminals. 
As in the single-condition case, a parsimonious assump- 
tion implies that the reconstructed subnetwork should be 
of minimum size. In addition, we require that its path- 
ways, leading from the anchors to each of the terminals, 
are as homogeneous as possible in terms of the condi- 
tions, or labels they span. We formulate the resulting 
minimum labeling problem, show that it is NP-complete
and characterize its solutions. We then offer an equivalent
formulation that allows us to design a polynomial-size integer
linear programming (ILP) formulation for its solution. We
implement the ILP algorithm, MKL, and apply it to two
datasets in humans concerning the response to influenza
infection and ER export regulation. We show that the
MKL networks are significantly enriched with respect
to the related biological processes and yield novel insights
into the modeled processes. Finally, we compare MKL with an
extant method, ANAT [4], demonstrating the power of our
algorithm in integrating data from multiple conditions in a
compact and informative manner.

Figure 1 The optimal MKL solution for $\alpha = 0.5$ is neither the
union of label-specific Steiner trees nor a subgraph of it. In this
instance $k = 2$, $T_1 = \{x, y\}$ and $T_2 = \{y, z\}$. The optimal Steiner trees
for $T_1$ and $T_2$ are composed of the red (dashed) and blue (solid) edges,
respectively. The best MKL solution that uses only edges of the union
can be achieved by pushing label 1 over the red edges and 2 over the
blue edges, resulting in 14 labels and 14 edges. In contrast, the
optimal solution, whose labels appear on top of the figure, contains
the blue and green (waved) edges, spanning 15 labels and 9 edges.

Preliminaries 

Let $G = (V, E)$ be a directed graph, representing a
protein-protein interaction (PPI) network, with vertex set
$V$ and edge set $E$, and let $a \in V$ be an anchor node. Denote
by $In(v)$ ($Out(v)$) the set of incoming (outgoing) edges of
a node $v \in V$, respectively. Let $L = \{1, \ldots, k\}$ be a set of
labels, representing $k \geq 1$ conditions. Let $f : E \to 2^{L}$
be a labeling function that assigns each edge of $E$ a (possibly
empty) subset of labels. For $1 \leq i \leq k$, we define
$E_i(f) := \{e \in E : i \in f(e)\}$ to be the set of edges
with label $i$. We further denote $f_{in}(v) = \bigcup_{e \in In(v)} f(e)$
and $f_{out}(v) = \bigcup_{e \in Out(v)} f(e)$.

We say that a labeling function $f$ is valid if for every terminal
$t$ and condition $i$ in which $t$ is affected, there exists
a path from $a$ to $t$ whose edges are restricted to $E_i(f)$, or
in other words, are assigned the label $i$. We evaluate
the cost of the labeling according to the number of labels
$L(f)$ used and the number of edges $N(f)$ that are assigned
at least one label. Formally, $L(f) = \sum_{e \in E} |f(e)|$ and
$N(f) = |\{e \in E : f(e) \neq \emptyset\}|$. The cost is then defined as
$\alpha \cdot L(f) + (1 - \alpha) \cdot N(f)$, where $0 \leq \alpha \leq 1$ balances the two
terms.
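As an illustration of the cost definition, the following minimal Python sketch evaluates $L(f)$, $N(f)$ and the combined cost for a toy labeling. The dictionary representation of $f$ (edge to set of labels), the function name `labeling_cost` and the example data are assumptions made for this example only.

```python
def labeling_cost(f, alpha):
    """Return (L(f), N(f), cost) for a labeling f given as {edge: set_of_labels}."""
    num_labels = sum(len(labels) for labels in f.values())    # L(f)
    num_edges = sum(1 for labels in f.values() if labels)     # N(f)
    cost = alpha * num_labels + (1 - alpha) * num_edges
    return num_labels, num_edges, cost


# Hypothetical toy labeling with k = 2 labels; the edge (a, w) is unused.
f = {
    ("a", "u"): {1, 2},
    ("u", "x"): {1},
    ("u", "y"): {2},
    ("a", "w"): set(),
}
print(labeling_cost(f, alpha=0.5))  # (4, 3, 3.5)
```

Setting `alpha=1` counts labels only and `alpha=0` counts edges only, matching the two extremes of the cost.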

We study the following minimum $k$-labeling (MKL)
problem on $G$: The input is an anchor node $a \in V$ and
$k \geq 1$ sets of terminals $T_1, \ldots, T_k$ in $V \setminus \{a\}$ that implicitly
assign to each terminal the subset of conditions (or
labels) in which it is affected. The objective is to find a
valid labeling of the edges of $G$ of minimum cost.

Clearly, any valid labeling induces a subnetwork that can 
model the given conditions: this subnetwork is comprised 
of those edges that are assigned a non-empty subset of 
labels. We note that for k = 1 we have L(f) = N(f), 
thus in this case the MKL problem is equivalent to the 
minimum directed Steiner tree problem. The parameter
$\alpha$ balances between two types of solutions: (1) a subnetwork
with a minimum number of labels ($\alpha = 1$), which
is equivalent to the union of independent Steiner trees
for each of the conditions, and (2) a subnetwork with a
minimum number of edges ($\alpha = 0$), which is simply a
Steiner tree spanning the terminals in the union of all
conditions. However, general instances of MKL where
$\alpha \neq 0, 1$ can be solved neither by combining the independent
Steiner trees of each of the conditions nor by
constructing a single Steiner tree over all terminals. This
is illustrated by the toy examples in Figures 1 and 2. Next,
we provide a characterization of solutions to the MKL 
problem. 



Figure 2 The optimal MKL solution for $\alpha = 0.6$ is not a minimum
Steiner tree over all terminals. In this instance $k = 2$, $T_1 = \{x, w\}$
and $T_2 = \{y, \ldots\}$. The black (solid) edges form a Steiner tree with 6
edges and 8 labels, whereas the blue (dashed) edges constitute an
MKL solution with 7 edges and 7 labels.

Theorem 1. Given a solution labeling $f$ to an MKL
instance, let $G_i$ denote the subgraph of $G$ that is induced by
the edges in $E_i(f)$. Then $G_i$ is a directed tree rooted at $a$.

Proof. By definition, there is a directed path in $G_i$
from $a$ to each of the terminals in $T_i$. Clearly, any edge
directed into $a$ can be removed without affecting the
constraints of a valid solution. Thus, it suffices to show
that the underlying undirected graph of $G_i$ contains no
cycles. By minimality of the solution, every vertex in $G_i$
is reachable from $a$ or else it can be removed along with
its edges. Suppose to the contrary that $v_1, \ldots, v_r$ is a
cycle in the underlying graph. Since $a$ cannot be on this
cycle and by the above observation, each of the cycle's
vertices is reachable from $a$. W.l.o.g., let $v_1$ be the farthest
from $a$ in $G_i$ among all cycle vertices. Then one
can obtain a smaller solution by removing one of the
edges $(v_1, v_2)$, $(v_r, v_1)$ (depending on their orientations), a
contradiction. □
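The problem definition and the tree property of Theorem 1 can be checked by exhaustive search on a tiny instance. The sketch below is only an illustration, not the algorithm used in this work: it enumerates every labeling (exponential time), and the toy graph, the function names and the data layout are all assumptions of this example.

```python
from itertools import product


def reachable(edges, source):
    """Nodes reachable from source using the given directed edges."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for p, q in edges:
            if p == u and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def is_valid(f, anchor, terminal_sets):
    """Every terminal of T_i must be reachable from the anchor inside E_i(f)."""
    for i, terminals in terminal_sets.items():
        edges_i = [e for e, labels in f.items() if i in labels]
        if not terminals <= reachable(edges_i, anchor):
            return False
    return True


def brute_force_mkl(edges, anchor, terminal_sets, alpha):
    """Try all labelings; feasible only for a handful of edges and labels."""
    labels = sorted(terminal_sets)
    subsets = [frozenset(l for l, bit in zip(labels, bits) if bit)
               for bits in product([0, 1], repeat=len(labels))]
    best, best_cost = None, float("inf")
    for choice in product(subsets, repeat=len(edges)):
        f = dict(zip(edges, choice))
        if not is_valid(f, anchor, terminal_sets):
            continue
        n_labels = sum(len(s) for s in choice)
        n_edges = sum(1 for s in choice if s)
        cost = alpha * n_labels + (1 - alpha) * n_edges
        if cost < best_cost:
            best, best_cost = f, cost
    return best, best_cost


def is_tree_rooted_at(edges_i, anchor):
    """True if edges_i form a directed tree rooted at anchor."""
    nodes = {anchor} | {v for e in edges_i for v in e}
    heads = [q for _, q in edges_i]
    return (len(edges_i) == len(nodes) - 1
            and anchor not in heads
            and len(set(heads)) == len(heads)
            and reachable(edges_i, anchor) == nodes)


# Hypothetical toy instance: anchor "a", k = 2 conditions.
edges = [("a", "u"), ("u", "x"), ("u", "y"), ("u", "z"), ("a", "y")]
terminal_sets = {1: {"x", "y"}, 2: {"y", "z"}}
best, best_cost = brute_force_mkl(edges, "a", terminal_sets, alpha=0.5)
trees_ok = all(
    is_tree_rooted_at([e for e, s in best.items() if i in s], "a")
    for i in terminal_sets
)
print(best_cost, trees_ok)  # 5.0 True
```

In this toy instance each label needs at least three edges and the network at least four, so the minimum cost at $\alpha = 0.5$ is $0.5 \cdot 6 + 0.5 \cdot 4 = 5$, and the edges carrying each label form a tree rooted at the anchor.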

As noted earlier, when k = 1 the MKL problem is 
equivalent to the minimum directed Steiner tree problem, 
which is known to be NP-complete [5]. A simple reduction 
from this case yields the following result: 

Theorem 2. The MKL problem is NP-complete for every
$k \geq 1$.

Proof. Let $k > 1$. Given an instance of the minimum
1-labeling problem, that is, a network $G = (V, E)$,
an anchor $a \in V$ and a single set of terminals $T \subseteq V$,
we generate the following input to the minimum $k$-labeling
problem. Define the background network $G' = (V', E')$,
where $V' = V \cup \{t_1, \ldots, t_{k-1}\}$ and
$E' = E \cup \{(a, t_1), \ldots, (a, t_{k-1})\}$, where $\{t_i\}_{i=1}^{k-1}$ are new nodes not in
$V$. The input $k$ sets of terminals are then $T, \{t_1\}, \ldots, \{t_{k-1}\}$,
and the anchor remains $a$. The key observation to complete
this proof is that an optimum solution to the reduced
instance must include all edges $(a, t_i)$, plus an optimal
tree that connects $a$ to the terminals in $T$ using a single
label. □
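The construction used in the proof can be written down directly. The following Python sketch builds the $k$-label instance from a single-label instance; the function name, the tuple-based names of the fresh nodes and the example input are hypothetical choices made for illustration.

```python
def reduce_steiner_to_mkl(nodes, edges, anchor, terminals, k):
    """Build the k-label MKL instance described in the proof of Theorem 2."""
    if k < 2:
        raise ValueError("the reduction is stated for k > 1")
    # Fresh nodes t_1, ..., t_{k-1}; the naming scheme is an assumption.
    new_nodes = [("t_new", j) for j in range(1, k)]
    if set(new_nodes) & set(nodes):
        raise ValueError("fresh node names clash with existing nodes")
    nodes2 = list(nodes) + new_nodes
    # Each fresh node is attached directly to the anchor.
    edges2 = list(edges) + [(anchor, t) for t in new_nodes]
    # Label 1 keeps the original terminals; label j >= 2 gets a singleton.
    terminal_sets = {1: set(terminals)}
    for j, t in enumerate(new_nodes, start=2):
        terminal_sets[j] = {t}
    return nodes2, edges2, anchor, terminal_sets


# Hypothetical single-label instance turned into a k = 3 instance.
nodes2, edges2, anchor, terminal_sets = reduce_steiner_to_mkl(
    nodes=["a", "u", "x"],
    edges=[("a", "u"), ("u", "x")],
    anchor="a",
    terminals={"x"},
    k=3,
)
print(len(nodes2), len(edges2), sorted(terminal_sets))  # 5 4 [1, 2, 3]
```

Every added edge $(a, t_i)$ is forced into any valid labeling with exactly one label, so the added part contributes a fixed amount to the cost and the remaining optimization is the original Steiner tree instance.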

Methods 

An alternative formulation of MKL 

As the MKL problem is NP-complete, we aim to design 
an integer linear program for it, which will allow us to 
solve it to optimality or near-optimality for moderately- 
sized instances. In order to design an efficient ILP, we first 
provide an alternative formulation of the MKL problem, 
expressed in terms of units of flow per label pushed from 
the anchor toward the terminals. To this end, we extend 
the labeling definition to support assignment of multi-sets,
as described below. We denote a multi-set by a pair
$M = (S, \mu)$, where $S$ is a set and $\mu : S \to \mathbb{Z}^{+}$. We say that
$x \in M$ if $x \in S$. We let $|M|$ denote the cardinality of the
underlying set $S$.

The union $\uplus$ of two multi-sets $(S_1, \mu_1)$, $(S_2, \mu_2)$ is
defined as the pair $(S, \mu)$, where $S = S_1 \cup S_2$; for every
$x \in S_1 \cap S_2$, $\mu(x) = \mu_1(x) + \mu_2(x)$; for $x \in S_1 \setminus S_2$,
$\mu(x) = \mu_1(x)$; and for $x \in S_2 \setminus S_1$, $\mu(x) = \mu_2(x)$. We
extend the definitions of $f_{in}(v)$ and $f_{out}(v)$ to multi-sets
using this union operator. Finally, for a vertex $v \neq a$ we let
$L(v) = \{i \in L : v \in T_i\}$; note that for non-terminal nodes
$L(v) = \emptyset$.
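The multi-set union defined above coincides with addition of Python's `collections.Counter` objects when all multiplicities are positive. The sketch below illustrates this; the helper names `multiset_union`, `incoming` and `cardinality` and the example assignment `g` are assumptions of this example.

```python
from collections import Counter


def multiset_union(m1, m2):
    """Multi-set union: multiplicities of shared elements are summed."""
    return m1 + m2


def incoming(g, v):
    """Multi-set union of the labels on all edges entering v."""
    total = Counter()
    for (_, head), labels in g.items():
        if head == v:
            total = multiset_union(total, labels)
    return total


def cardinality(m):
    """|M|: size of the underlying set, ignoring multiplicities."""
    return len(m)


# Hypothetical multi-set assignment on two edges entering node "u".
g = {
    ("a", "u"): Counter({1: 2, 2: 1}),
    ("w", "u"): Counter({2: 1}),
}
print(incoming(g, "u"))               # Counter({1: 2, 2: 2})
print(cardinality(incoming(g, "u")))  # 2
```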

The alternative objective formulation is as follows: Find
a multi-set label assignment $g : E \to 2^{L}$ that satisfies the
following constraints:

(i) $g_{out}(a) = (L, \mu)$, where $\mu(i) = |T_i|$ for every $i \in L$.
(The total amount of flow that goes out from the
anchor per label equals the number of terminals that
belong to the corresponding experiment).

(ii) For every $v \neq a$, $g_{in}(v) = g_{out}(v) \uplus L(v)$.
(For each label $i$, the incoming flow of a node $v$
equals its outgoing flow, incremented by 1 if $v$ is a
terminal expressed in experiment $i$).

(iii) Denote $L(g) = \sum_{e \in E} |g(e)|$ and
$N(g) = |\{e \in E : g(e) \neq \emptyset\}|$, and let $0 \leq \alpha \leq 1$. Then
$\alpha \cdot L(g) + (1 - \alpha) \cdot N(g)$ is minimal.

We claim that the two formulations are equivalent. Given
a multi-set labeling $g$, it is easy to transform it into a labeling
$f$ by taking at each edge the underlying set of labels.
One can show that the labeling $f$ is valid, i.e. for each $i$
there are paths in $E_i(f)$ that connect $a$ to each of the terminals
in $T_i$. For the other direction, given a labeling $f$ we
can transform it into a multi-set labeling $g$ by defining the
multiplicity of a label $i$ at the edge $(u, v) \in E_i(f)$ as the
number of terminals from $T_i$ in the subtree of $G_i$ that is
rooted at $v$. It is easy to see that all constraints are satisfied
by this transformation.
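The transformation from a labeling $f$ to a multi-set labeling $g$ can be sketched as follows. The code assumes, as Theorem 1 guarantees for a solution, that the edges of each label form a tree rooted at the anchor; function names and the toy labeling are hypothetical.

```python
from collections import Counter


def labeling_to_multiset(f, terminal_sets):
    """g(e)[i] = number of T_i terminals in the subtree of G_i rooted at e's head."""
    g = {e: Counter() for e in f}
    for i, terminals in terminal_sets.items():
        children = {}
        for (u, v), labels in f.items():
            if i in labels:
                children.setdefault(u, []).append(v)

        def count(v):
            # Terminals of T_i at v or below it (G_i is assumed to be a tree).
            return (v in terminals) + sum(count(c) for c in children.get(v, []))

        for (u, v), labels in f.items():
            if i in labels:
                g[(u, v)][i] = count(v)
    return g


def flow_constraints_hold(g, anchor, terminal_sets):
    """Check constraints (i) and (ii) of the flow-based formulation."""
    nodes = {x for e in g for x in e}
    for i, terminals in terminal_sets.items():
        out_anchor = sum(c[i] for (u, _), c in g.items() if u == anchor)
        if out_anchor != len(terminals):
            return False
        for v in nodes - {anchor}:
            inflow = sum(c[i] for (_, q), c in g.items() if q == v)
            outflow = sum(c[i] for (p, _), c in g.items() if p == v)
            if inflow != outflow + (v in terminals):
                return False
    return True


# Hypothetical valid labeling with T_1 = {x, y} and T_2 = {y, z}.
terminal_sets = {1: {"x", "y"}, 2: {"y", "z"}}
f = {
    ("a", "u"): {1, 2},
    ("u", "x"): {1},
    ("u", "y"): {1, 2},
    ("u", "z"): {2},
}
g = labeling_to_multiset(f, terminal_sets)
print(g[("a", "u")])                                # Counter({1: 2, 2: 2})
print(flow_constraints_hold(g, "a", terminal_sets))  # True
```

Taking the key set of each `Counter` recovers the original labeling $f$, which is the opposite direction of the equivalence.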

The above problem formulation can be made stricter by 
requiring that the set of incoming labels to a terminal is 
exactly the set of labels associated with the terminal. That 
is, for every terminal $t$ and $i \in L \setminus L(t)$, we require that
$i \notin g_{in}(t)$. Our ILP formulation includes this requirement
in order to better reflect the experimental observations, 
though in practice both versions produce very similar 
results. 

An ILP algorithm 

In order to formulate the problem as an integer program, 
we define three sets of variables: (i) binary variables of 
the form $y_e^i$, indicating for every $e \in E$ and $i \in L$ whether
the edge $e$ is tagged with label $i$; (ii) integer variables
of the form $x_e^i$, indicating for every $e \in E$ and $i \in L$ the
multiplicity of label $i$ (in the range 0 to $|T_i|$); and (iii)
binary variables of the form $z_e$, indicating for every $e \in E$
whether the edge $e$ participates in the subnetwork (carrying
any label). For a vertex $v \in V$, let $b_v^i$ be a binary
indicator of whether $i \in L(v)$ or not. Let $\alpha$ be some fixed
value in the range $[0, 1]$. The formulation is as follows
(omitting the constraints on variable ranges):

$$\min \; \alpha \cdot \sum_{e \in E,\, i \in L} y_e^i + (1 - \alpha) \cdot \sum_{e \in E} z_e$$

s.t.:

$$
\begin{aligned}
& y_e^i \leq x_e^i \leq |T_i| \cdot y_e^i && \forall e \in E,\ i \in L && (1) \\
& y_e^i \leq z_e && \forall e \in E,\ i \in L && (2) \\
& \sum_{e \in Out(a)} x_e^i = |T_i| && \forall i \in L && (3) \\
& \sum_{e \in In(v)} x_e^i = \sum_{e \in Out(v)} x_e^i + b_v^i && \forall v \in V \setminus \{a\},\ i \in L && (4) \\
& \sum_{e \in In(t)} y_e^i = 0 && \forall t \in \bigcup_{j \in L} T_j,\ i \notin L(t) && (5)
\end{aligned}
$$

By Theorem 1, the constraint

$$\sum_{e \in In(v)} y_e^i \leq 1 \qquad \forall v \in V,\ i \in L \qquad (6)$$

can be added to the ILP without affecting the optimal
solution. The following Lemma leverages this insight for
enhancing the ILP performance by removing some of the
integrality constraints.

Lemma 1. Assume that constraint (6) is added to the ILP
formulation above. If all $y_e^i$'s are restricted to binary values
then the range constraints $x_e^i \in [0, |T_i|]$ and $z_e \in [0, 1]$
guarantee that all $x_e^i$'s and $z_e$'s are assigned integer values
in any optimal solution.

Proof. Let $v \in V$. We first prove that for every $e \in In(v)$
and $i \in L$, $x_e^i$ must be an integer. By the new constraint (6)
and the integrality of all $y_e^i$'s, the sum $\sum_{e \in In(v)} y_e^i$ is either
0 or 1. If it is 0 then by constraint (1), for each of these
edges $x_e^i = 0$. Otherwise, exactly one of these edges has
$y_e^i = 1$ and therefore $x_e^i > 0$. Denote by $G_i$ the subnetwork
that is induced by all edges having nonzero flow for
label $i$ (i.e. edges $e$ fulfilling $x_e^i > 0$). Denote by $T_i(v)$ the
set of terminals in $T_i$ that are reachable from $v$ in $G_i$, and
let $t \in T_i(v)$. By applying the above argument for each
of the nodes between $v$ and $t$, we infer that there is a single
path that carries flow from $v$ to $t$ in $G_i$, and that all of
$t$'s incoming flow (of label $i$) must pass through $v$. Every
$t \in T_i(v)$ absorbs a flow of 1 and therefore from the flow-preserving
constraint (4), $\sum_{e \in In(v)} x_e^i \geq |T_i(v)|$. The other
direction holds too since the flow of label $i$ that $v$ sends can
be collected only by terminals in $T_i(v)$. Thus, we conclude
that all $x_e^i$'s in this sum equal 0 except for a single element
which equals $|T_i(v)|$, i.e. all of them are integers.

To prove that all $z_e$'s are integral, consider some edge
$e \in E$. If there exists $i \in L$ such that $y_e^i = 1$ then from constraint
(2) it follows that $z_e = 1$. Otherwise, the equality
$z_e = 0$ follows from the minimality of the solution. □

Heuristic data reduction and runtime analysis 

Since solving an ILP is a time consuming task, we devised 
a heuristic method for filtering the input network, aiming 
to capture those edges that the MKL optimal solution is 
more likely to use. Specifically, we focused on (directed) 



edges that lie on a near-shortest path (up to $d$ edges
longer than a shortest path) between the anchor and any
of the terminals.
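One possible way to implement this filter is sketched below. It keeps an edge $(u, v)$ if, for some terminal $t$, the shortest route from the anchor to $u$, followed by the edge and by the shortest route from $v$ to $t$, has at most $d$ more edges than a shortest anchor-to-$t$ path. This is an interpretation, not necessarily the exact procedure used here: it measures walks rather than simple paths, and all names and the toy network are hypothetical.

```python
from collections import deque


def bfs_dist(adj, source):
    """Unweighted shortest-path distances from source over adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def filter_near_shortest(edges, anchor, terminals, d):
    """Keep edges lying on an anchor-to-terminal route within d of the shortest."""
    fwd, rev = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)
        rev.setdefault(v, []).append(u)
    from_anchor = bfs_dist(fwd, anchor)
    kept = set()
    for t in terminals:
        if t not in from_anchor:
            continue  # terminal unreachable from the anchor
        to_t = bfs_dist(rev, t)  # distances to t, computed on reversed edges
        budget = from_anchor[t] + d
        for u, v in edges:
            if (u in from_anchor and v in to_t
                    and from_anchor[u] + 1 + to_t[v] <= budget):
                kept.add((u, v))
    return kept


# Hypothetical network with anchor-to-terminal routes of length 2, 3 and 4.
edges = [
    ("a", "b"), ("b", "t"),
    ("a", "c"), ("c", "e"), ("e", "t"),
    ("a", "p"), ("p", "q"), ("q", "r"), ("r", "t"),
]
for d in (0, 1, 2):
    print(d, len(filter_near_shortest(edges, "a", {"t"}, d)))
# prints 0 2, then 1 5, then 2 9
```

The cost is one breadth-first search from the anchor plus one reverse search per terminal, which is small compared with solving the ILP.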

In order to support this heuristic and find a value for $d$
that achieves a satisfactory balance between running time
and optimality, we tested the performance of our ILP
algorithm on the influenza dataset (which is the more
computationally expensive dataset described in the Experimental
results Section) with $d = 0$, $d = 1$, $d = 2$,
and without the heuristic filtering. These parameter values
induced input background networks of $0.01x$, $0.1x$,
$0.5x$ and $x$ edges, respectively, where $x \approx 80{,}000$ is the
complete network size. Using $d = 1$, six hours were sufficient
to achieve an optimal solution of cost (combined
number of labels and edges) 275. Using $d = 2$, a solution
of similar quality (cost 272) was achieved after 48 hours.
This execution also proved that the optimal solution with
$d = 2$ has a lower bound of at least 262, showing that
the theoretical improvement over $d = 1$ is limited to less
than 5%. This analysis motivated our selection of $d = 1$
for the experimental evaluation that follows. Further, it is
interesting to note that with this choice, the convergence
toward the optimum is very fast: in three hours one could
achieve a solution that is less than 1% behind the optimum
(though this time period was not enough to prove
this approximation guarantee). This is in stark contrast to
the settings of $d \geq 2$ that are characterized by very slow
convergence (>10% approximation ratio after 24 hours).
The results are summarized in Figure 3.
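The bound of less than 5% follows from the two reported numbers, the cost 275 obtained with $d = 1$ and the lower bound 262 proven for $d = 2$. The choice of denominator is an assumption here, but the bound holds for either one:

```latex
\frac{275 - 262}{275} = \frac{13}{275} \approx 0.047 < 0.05,
\qquad
\frac{275 - 262}{262} = \frac{13}{262} \approx 0.0496 < 0.05 .
```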




[Figure 3 plot. Legend: achieved solution and lower bound for each of d = 0, d = 1, d = 2 and d = Inf (no heuristic filtering). Horizontal axis: running time (hours), with ticks at 3, 6, 12, 24 and 48.]

Figure 3 Dependency of the MKL algorithm performance on the
heuristic filtering. This figure compares the performance of the MKL
algorithm with respect to different values of the heuristic data
reduction parameter $d$ and when setting different limits on the
running time. For each value of $d$ (0, 1, 2 or no heuristic filtering), two
plots with the same color are displayed: the top plot (triangle
symbols) shows the cost of the achieved solution after the specified
number of hours (or less); the bottom plot (circles) shows the best
proven theoretical lower bound on an optimal solution as reported
by the same execution. Note that for $d = 0$ these two (black) plots
fully coincide.

Performance evaluation

We used the commercial IBM ILOG CPLEX optimizer 
to solve the ILP and instructed it to accept approximate 
solutions that deviate by at most 5% from the optimum, 
enabling our executions to end within less than two hours. 

We evaluated a solution subnetwork using both 
network-based and biological measures. The network- 
based measures included the number of labels, number 
of edges and a homogeneity score. To compute the homo- 
geneity score of a node v, we examined the frequencies 
of all subsets of labels assigned to terminals under v. The 
score of v was defined as the highest frequency found 
divided by the number of terminals under v. The homo- 
geneity score of the subnetwork was then defined as the 
average over all nodes that span at least two terminals. To 
quantify the biological significance of the reconstructed 
subnetworks, we measured the functional enrichment of 
their internal nodes (non-input nodes) with respect to val- 
idation sets that pertain to the process in question. In 
addition, we provide expert analysis of the subnetworks. 
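The homogeneity score can be sketched in a few lines of Python. The definition leaves some details open, so this example makes explicit assumptions: "terminals under $v$" is read as terminals among the strict descendants of $v$, and the subnetwork is given as a child-list dictionary. The function names and the toy data are hypothetical.

```python
from collections import Counter


def terminals_under(children, v, terminal_labels):
    """Terminals among the strict descendants of v (v itself is excluded)."""
    seen, found = set(), set()
    stack = list(children.get(v, []))
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        if u in terminal_labels:
            found.add(u)
        stack.extend(children.get(u, []))
    return found


def homogeneity(children, terminal_labels):
    """Average, over nodes spanning >= 2 terminals, of the top label-set frequency."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    scores = []
    for v in nodes:
        under = terminals_under(children, v, terminal_labels)
        if len(under) < 2:
            continue
        freq = Counter(frozenset(terminal_labels[t]) for t in under)
        scores.append(max(freq.values()) / len(under))
    return sum(scores) / len(scores) if scores else None


# Hypothetical subnetwork: terminals x, y, z, p with their label sets.
children = {"a": ["u", "w"], "u": ["x", "y", "z"], "w": ["p"]}
terminal_labels = {"x": {1}, "y": {1}, "z": {2}, "p": {1, 2}}
score = homogeneity(children, terminal_labels)
print(round(score, 3))  # 0.583
```

Here node `u` scores 2/3 (two of its three terminals share the label set {1}), node `a` scores 2/4, and `w` is skipped because it spans a single terminal, giving an average of 7/12.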

We compared the performance of our method to that
of the state-of-the-art ANAT reconstruction tool [4],
which was shown to outperform many existing tools in
anchored reconstruction scenarios. For each dataset, we
applied ANAT (with its default parameters, and without
the heuristic filtering) to each condition separately,
then unified the results to get an integrated subnetwork.
We labeled the solution straightforwardly: an edge $e$ was
labeled $i$ if $e$ participated in the subnetwork that was constructed
for condition $i$. We also compared our results to
those attained by computing a Steiner tree over the terminals
of all conditions together, implemented using the
same ILP algorithm by setting $\alpha = 0$.

Experimental results 

We tested the performance of our algorithm on two 
human datasets concerning the cellular response to 
the influenza virus and ER export regulation. The two 
datasets were analyzed in the context of a human PPI net- 
work reported in [4] which contains 44,738 (bidirectional) 
interactions over 10,169 proteins. 

For each of the two datasets, we tested the robustness of 
our algorithm to different choices of the weighting parameter
$\alpha$, observing that the number of edges and labels
varied by at most 8% and 4%, respectively, over a wide
range of values (0.25-0.75). Thus, we chose $\alpha = 0.5$ for
our analyses in the sequel. 

Response to influenza infection 

We used data on the response to viral infection by 
the H1N1 influenza strain A/PR/8/34 ('PR8') in primary
human bronchial epithelial cells [6]. The dataset contains
a collection of 135 virus-human PPIs and gene expression
profiles, measured at different time points along the
course of the infection. We focused on four time points
(the "conditions") $t = 2, 4, 6, 8$ (i.e. $k = 4$ labels), in each
time point selecting those genes that were differentially
expressed above a cutoff of 0.67 [6]. We did not include
time points earlier than $t = 2$ or later than $t = 8$, as the
former had no or very few differentially expressed genes, 
while the latter induced an order of magnitude larger 
gene sets that are presumably associated with secondary 
responses. 

We augmented the human network by the influenza-host
PPIs and an auxiliary anchor node (named 'Virus')
which we connected to the 10 viral proteins. After the
heuristic filtering (using d = 1), the network contained 
1,598 proteins and 8,708 interactions. 

The four terminal sets contained 8, 19, 19 and 49 proteins,
respectively, with 77 total in their union, out of
which 57 were reachable from the anchor. The resulting 
MKL subnetwork, which is shown in Figure 4, contains 
127 edges over 123 nodes (117 human, 5 viral and the 
anchor node) with 60 internal (non-input) nodes. This 
subnetwork is much more compact than the solution sug- 
gested by ANAT, which contains 173 nodes out of which 
106 are internal. The subnetworks of MKL and ANAT are 
quite different in terms of node composition, having 31 
internal intersecting nodes. A summary of our network- 
based measures for the subnetworks predicted by our 
algorithm, ANAT, and the Steiner tree algorithm is given 
in Table 1. 

Next, we scored the enrichments of both subnetworks 
with viral infection related processes such as: viral repro- 
duction, intracellular receptor mediated signaling path- 
way and apoptosis. The MKL subnetwork was highly 
enriched with these processes, outperforming the ANAT 
and the Steiner subnetworks (Table 2). In the following we 
present a detailed analysis of the MKL inferred subnet- 
work and demonstrate its high predictive power and its 
ability to characterize viral proteins and host mediators in 
terms of their temporal effect on their targets. Specifically, 
we show that this subnetwork suggests that an imbalance 
in the timing of effect between viral proteins (e.g. M1 and
NP) or between host mediators (such as SmadS and UBC)
can reveal their different kinetics of influence on host proteins.
This is in stark contrast to the results produced
by the ANAT tool, which does not provide any timing
imbalance among downstream targets of viral proteins or 
host mediators (data not shown). 

We first present an example of an inferred pathway,
selected to demonstrate our MKL approach. The
m-RnfS-UBC-DAXX-MX1 and NS1-SP100-MX1 paths
are a clear example of a predicted pathway that is well
supported by extant experimental findings. It is consistent
with the known role of both DAXX and SP100
as major components of the PML bodies, which together
control the localization of MX1 in distinct nuclear
components [7]. Further, DAXX is known to be regulated
in vivo by ubiquitination through UBC and RnfS
[8], supporting our placement of DAXX downstream of
UBC.

Figure 4 The MKL subnetwork for the influenza infection data. Terminal nodes are marked by their corresponding time point: t = 2 - yellow/triangle;
t = 4 - green/square; t = 6 - red/hexagon; t = 8 - gray/octagon; more than one time point - cyan oval nodes with thick border. The root is the
artificial virus node and the first level is composed solely of viral proteins.

The MKL network shows that the targets of some
human proteins have a common temporal behavior,
whereas others have different downstream temporal
responses. This is consistent with the fact that PPIs naturally
represent different mechanisms that might differ
in their kinetics. For example, the targets of Traf2 are
mainly early responding genes whereas the targets of
Ccdc33 have longer temporal responses. The early effect
of Traf2 is consistent with the findings that Traf2 is a
signal transduction kinase protein with fast kinetics.
A similar characterization can be applied to other signal
transduction proteins such as SmadS. Conversely, the
Ccdc33 protein regulates its targets in late time points
(6-8 hours) by an unknown mechanism. The results here
suggest that this mechanism is orders of magnitude slower
than phosphorylation. Similarly, the control of RnfS and
UBC is expected to show fast kinetics through ubiquitination.
Nevertheless, we find that all 19 RnfS/UBC
targets are controlled in late time points (6-8 hours),
suggesting a novel temporal (late) control on the activity
of RnfS-specific UBC-based ubiquitination during the
course of influenza infection.

Table 1 Comparison of network-based measures between
MKL, ANAT and the Steiner tree algorithm

Measure                 MKL     ANAT    Steiner
Influenza infection
  No. of labels         158     254     187
  No. of edges          122     186     113
  Homogeneity score     0.63    0.58    0.57
ER export
  No. of labels         152     213     163
  No. of edges          145     203     144
  Homogeneity score     0.88    0.74    0.81

This table compares the subnetworks reconstructed by the MKL, ANAT and
Steiner tree algorithms for the viral infection and the ER export datasets with
respect to the following measures: number of labels, number of edges and
homogeneity score.

Table 2 Comparison of enrichments between the MKL,
ANAT and Steiner tree solutions

Biological process                                               MKL       ANAT      Steiner
Influenza infection
  Intracellular receptor mediated signaling pathway (GO:0030522) 6.5e-10   2.1e-04   1.2e-05
  Apoptosis (GO:0006915)                                         3.7e-04   1.7e-04   3.3e-04
  Viral reproduction (GO:0016032)                                2.5e-03   >0.01     >0.01
ER export
  Vesicle-mediated transport (GO:0016192)                        1.2e-05   7.6e-04   8.5e-05
  Cellular membrane organization (GO:0016044)                    1.4e-05   6.6e-05   1.6e-05
  Intracellular protein transport (GO:0006886)                   9.2e-06   7.8e-06   2.3e-05

This table compares the hypergeometric p-values indicating the significance of
the overlap between each of the predicted subnetworks (considering non-input
genes only) and the gene sets of GO categories that are of relevance to the
investigated biological processes.

Regulation of endoplasmic reticulum (ER) export 

The journey of secretory proteins, which make up roughly
30% of the human proteome, starts by exit from the
ER. Export from the ER is executed by so-called COPII
vesicles that bud from ER exit sites (ERES). A protein
that is of central importance for ERES biogenesis and
maintenance is Sec16A, a large (~250 kDa) protein that
localizes to ERES and interacts with COPII components
[9]. We have recently performed an siRNA screen to test
for kinases and phosphatases that regulate the functional
organization of the early secretory pathway [10].
Among the hits identified were 64 kinases/phosphatases
that when depleted result in a reduction in the number
of ERES. Thus, these are 64 different potential
regulators of ER export. More recently, a full genome
screen tested for genes that regulate the arrival of a
reporter protein from the ER to the cell surface [11].
There, the depletion of 45 proteins was shown to affect
ERES. However, whether the defect in arrival of the
reporter to the cell surface was due to an effect on ER
export or due to alterations in other organelles along
the secretory route (e.g., Golgi apparatus) remains to be
determined.

We applied MKL to these two screens, serving as 
two "conditions" highlighting different repertoires of 
ER export signaling-regulatory pathways. As the two 
screens do not intersect (most likely due to differences in 
read-outs), there were 109 terminals overall, 85 of them 
reachable in our human PPI network. Due to its cen- 
tral importance for ER export and ERES formation, we 
chose Sec16A as the anchor for this application. After the
heuristic filtering, the network contained 1,907 nodes and
11,329 edges. The resulting MKL subnetwork, which has
145 nodes and 59 internal ones, is depicted in Figure 5. In
comparison, the ANAT solution contains 190 nodes and 
104 internal ones (with 35 internal nodes common to the 
two solutions). As evident from Table 1, the MKL solution 
has a substantially lower cost and is more homogeneous. 

We assessed the functional enrichment of the MKL sub- 
network with biological processes that are of relevance to 
ER export such as cellular membrane organization, intra- 
cellular protein transport and vesicle-mediated transport. 
All three categories were highly enriched, and the p-values
attained compare favorably to those computed for the
ANAT and the Steiner solutions (Table 2).
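A hypergeometric enrichment p-value of the kind reported in Table 2 can be computed as an upper-tail probability. The sketch below assumes the common convention $P(X \geq \text{overlap})$; the function name, the parameter names and the toy numbers are illustrative, and the background population actually used is not specified in the text.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, overlap):
    """P(X >= overlap) for `sample` genes drawn without replacement.

    population: number of background genes
    annotated:  background genes belonging to the GO category
    sample:     genes in the subnetwork (e.g. its non-input nodes)
    overlap:    subnetwork genes that belong to the category
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated, 3 sampled, 2 of them annotated.
p = hypergeom_pvalue(population=10, annotated=4, sample=3, overlap=2)
print(round(p, 4))  # 0.3333
```

In the toy case the tail is $(\binom{4}{2}\binom{6}{1} + \binom{4}{3}\binom{6}{0}) / \binom{10}{3} = 40/120$.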

Interestingly, 4 proteins of the MKL solution are related
to autophagy (two of them internal nodes, p = 0.02).
Autophagy is an endomembrane-based cellular process
that is responsible for the capture and degradation of surplus
organelles and proteins. Links between ER export
and autophagy have been proposed [12] but there is very
limited mechanistic insight into this link. The vesicle-mediated
transport process includes the STX17, SNAP29
and ULK1 proteins. The latter is a kinase that initiates the
biogenesis of autophagosomes [1]. STX17 and SNAP29
were recently proposed to be involved in autophagy by
promoting the formation of ER-mitochondria contact
sites and the fusion of autophagosomes with lysosomes
[14,15]. As the MKL network was generated with terminals
and an anchor that regulate ER export, we propose
that this approach could be used to identify the
molecular link between secretion and autophagy in the
future.

Figure 5 The MKL subnetwork for the ER export data. Terminal nodes are colored/shaped according to the screen they were discovered in: [10] -
yellow/triangle, and [11] - cyan/square.

Conclusions 

The protein-protein interaction network represents a 
combination of diverse regulation and interaction mech- 
anisms operating in different conditions and time scales. 
Integrating such data in a coherent manner to describe a 
process of interest is a fundamental challenge, which we 
aim to tackle in this work via a novel ILP-based mini- 
mum labeling algorithm. We apply our algorithm to two 
human datasets and show that it attains compact solu- 
tions that capture the dynamics of the data and align 
well with current knowledge. We expect this type of anal- 
ysis to gain further momentum as composite datasets 
spanning multiple conditions and time points continue to 
accumulate.